Abstract: Researchers initially addressed spam detection as a text classification or categorization problem. However, as spammers continue to develop new techniques and email content becomes more diverse, text-based anti-spam approaches alone are no longer sufficient to prevent spam. In an attempt to defeat anti-spam technologies, spammers have recently adopted the image spam trick, which makes scrutiny of an email's body text ineffective. The main idea behind this project is to design a spam detection system that analyses the content of emails, in particular artificially generated images sent as email attachments. The system analyses the image content, classifies the embedded image as spam or legitimate, and classifies the email accordingly. Experimental results show that this approach can achieve a high recognition ratio and reduce computational cost.
Keywords: Spam Filtering, Content Based Filtering, Spam Email Detection, Machine Learning, Nearest Neighbour Classifier, Pattern Recognition, Data Mining Classification, Naïve Bayes.